STATISTICAL SOFTWARE R IN CORPUS-DRIVEN RESEARCH AND MACHINE LEARNING

Abstract

The rapid development of computer software and network technologies has facilitated the intensive application of specialized statistical software not only in traditional information technology spheres (i.e., statistics, engineering, artificial intelligence) but also in linguistics. R is one of the most popular analytical tools for processing a huge array of digitalized language data, especially in quantitative corpus linguistic studies in Western Europe and North America. This article discusses the functionality of the software package R, focusing on its advantages in performing complex statistical analyses of linguistic data in corpus-driven studies and in creating linguistic classifiers in machine learning. With this in mind, a three-stage strategy of computer-statistical analysis of linguistic corpus data is elaborated: 1) preparing the data to be subjected to the statistical procedure, 2) utilizing statistical hypothesis testing methods (MANOVA, ANOVA) and the Tukey post-hoc test, 3) developing a model of the linguistic classifier and analyzing its effectiveness. The strategy is implemented on 11 000 tokens of English detached nonfinite constructions with an explicit subject extracted from the BNC-BYU corpus. The statistical analysis indicates significant differences in the realization of the factors of the parameter “Part of speech of the subject”. The analyzed factors are employed to build the classification model of the given constructions. Particular attention is devoted to the methodological perspectives of interdisciplinary research in the fields of linguistics and machine learning studies. The potential of the elaborated case study in training undergraduate, master, and postgraduate students of Applied Linguistics is indicated. The article provides all the codes written in the R script with comprehensive descriptions and explanations. The concluding part summarizes the obtained results and highlights the issues for further research connected with the popularization of R and with raising the awareness of specialists in the statistical analysis system.

Similar resources

Software Effort Prediction using Statistical and Machine Learning Methods

Accurate software effort estimation is an important part of the software process. Effort is measured in terms of person-months and duration. Both overestimation and underestimation of software effort may lead to risky consequences. Also, software project managers have to make estimates of how much a software development effort is going to cost. The dominant cost for any software is the cost of calculating...

Statistical and Machine Learning

• Course plan: see the Table of Contents (tentative). We will emphasize knowing "why" and the statistical aspects rather than algorithms and programming, but you still have to know "how", either by writing your own implementation or by modifying others' code.
• Grading policy: homework 30% (score for late homework = full points × 0.8^d, where d is the number of days of delay); oral presentation 20% on assigned task...

A Machine Learning Approach for Statistical Software Testing

Some Statistical Software Testing approaches rely on sampling the feasible paths in the control flow graph of the program; the difficulty comes from the tiny ratio of feasible paths. This paper presents an adaptive sampling mechanism called EXIST for Exploration/eXploitation Inference for Software Testing, able to retrieve distinct feasible paths with high probability. EXIST proceeds by alterna...

Semantics-Driven Statistical Machine Translation

Semantic parsing, the task of mapping natural language sentences to logical forms, has recently played an important role in building natural language interfaces and question answering systems. In this talk, I will present three ways in which semantic parsing relates to machine translation: First, semantic parsing can be viewed *as* a translation task with many of the familiar issues, e.g., dive...

mlr: Machine Learning in R

The mlr package provides a generic, object-oriented, and extensible framework for classification, regression, survival analysis and clustering for the R language. It provides a unified interface to more than 160 basic learners and includes meta-algorithms and model selection techniques to improve and extend the functionality of basic learners with, e.g., hyperparameter tuning, feature selection...

Journal

Journal title: Information Technologies and Learning Tools

Year: 2021

ISSN: 2076-8184

DOI: https://doi.org/10.33407/itlt.v86i6.4627